Reinterpreting Importance-weighted Autoencoders

Authors

  • Chris Cremer
  • Quaid Morris
  • David Duvenaud
Abstract

The standard interpretation of importance-weighted autoencoders is that they maximize a tighter lower bound on the marginal likelihood than the standard evidence lower bound. We give an alternate interpretation of this procedure: that it optimizes the standard variational lower bound, but using a more complex distribution. We formally derive this result, present a tighter lower bound, and visualize the implicit importance-weighted distribution.

1 BACKGROUND

The importance-weighted autoencoder (IWAE; Burda et al. (2016)) is a variational inference strategy capable of producing arbitrarily tight evidence lower bounds. IWAE maximizes the following multi-sample evidence lower bound (ELBO):

$$\log p(x) \;\ge\; \mathbb{E}_{z_1 \ldots z_k \sim q(z|x)}\left[\log\left(\frac{1}{k}\sum_{i=1}^{k}\frac{p(x, z_i)}{q(z_i|x)}\right)\right] = \mathcal{L}_{\mathrm{IWAE}}[q] \qquad \text{(IWAE ELBO)}$$

which is a tighter lower bound than the ELBO maximized by the variational autoencoder (VAE; Kingma & Welling (2014)):

$$\log p(x) \;\ge\; \mathbb{E}_{z \sim q(z|x)}\left[\log\left(\frac{p(x, z)}{q(z|x)}\right)\right] = \mathcal{L}_{\mathrm{VAE}}[q]. \qquad \text{(VAE ELBO)}$$
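To make the two bounds concrete, the following sketch (not from the paper) estimates the IWAE ELBO by Monte Carlo on a toy one-dimensional Gaussian model. The densities, the proposal, and all numerical settings are illustrative assumptions chosen only so the estimator is easy to check.

```python
import numpy as np

# Toy model (an assumption for illustration): p(x, z) = N(z; 0, 1) N(x; z, 1),
# with proposal q(z|x) = N(z; x/2, 0.75), deliberately not the exact
# posterior N(x/2, 0.5) so that the bound actually tightens with k.
def log_joint(x, z):
    return -0.5 * (z**2 + (x - z) ** 2) - np.log(2 * np.pi)

def log_q(x, z):
    var = 0.75
    return -0.5 * ((z - x / 2) ** 2 / var + np.log(2 * np.pi * var))

def iwae_elbo(x, k, n_batches=20_000, rng=np.random.default_rng(0)):
    # Estimate E_{z_1..z_k ~ q}[ log( (1/k) sum_i p(x, z_i) / q(z_i|x) ) ].
    z = rng.normal(x / 2, np.sqrt(0.75), size=(n_batches, k))
    log_w = log_joint(x, z) - log_q(x, z)                  # log importance weights
    return np.mean(np.logaddexp.reduce(log_w, axis=1) - np.log(k))

x = 1.0
# k = 1 recovers the VAE ELBO; the estimates increase toward log p(x) as k grows.
print({k: round(iwae_elbo(x, k), 4) for k in (1, 5, 50)})
```

Setting k = 1 reduces the estimator to the VAE ELBO, and the printed values increase toward log p(x) as k grows, consistent with the bound ordering above.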
2 DEFINING THE IMPLICIT DISTRIBUTION q̃_IW

In this section, we derive the implicit distribution that arises from importance sampling from a distribution p using q as a proposal distribution. Given a batch of samples z_2...z_k from q(z|x), the following is the unnormalized importance-weighted distribution:

$$\tilde{q}_{IW}(z|x, z_{2:k}) = \frac{\frac{p(x,z)}{q(z|x)}}{\frac{1}{k}\sum_{j=1}^{k}\frac{p(x,z_j)}{q(z_j|x)}}\, q(z|x) = \frac{p(x, z)}{\frac{1}{k}\left(\frac{p(x,z)}{q(z|x)} + \sum_{j=2}^{k}\frac{p(x,z_j)}{q(z_j|x)}\right)} \qquad (1)$$

Figure 1: Approximations to a complex true distribution, defined via q_EW, shown for k = 1, 10, and 100 alongside the true posterior. As k grows, this approximation approaches the true distribution.

Here are some properties of the approximate IWAE posterior:

• When k = 1, q̃_IW(z|x, z_{2:k}) equals q(z|x).
• When k > 1, the form of q̃_IW(z|x, z_{2:k}) depends on the true posterior p(z|x).
• As k → ∞, E_{z_2...z_k}[q̃_IW(z|x, z_{2:k})] approaches the true posterior p(z|x) pointwise. See the appendix for details.

Importantly, q̃_IW(z|x, z_{2:k}) depends on the batch of samples z_2...z_k. See Fig. 3 in the appendix for a visualization of q̃_IW with different batches of z_2...z_k.

2.1 RECOVERING THE IWAE BOUND FROM THE VAE BOUND

Here we show that, in expectation, the IWAE ELBO is equivalent to the VAE ELBO with a more flexible, unnormalized distribution q̃_IW, implicitly defined by importance reweighting. If we replace q(z|x) with q̃_IW(z|x, z_{2:k}) and take an expectation over z_2...z_k, then we recover the IWAE ELBO (writing z_1 = z):

$$\begin{aligned}
\mathbb{E}_{z_2 \ldots z_k \sim q(z|x)}\big[\mathcal{L}_{\mathrm{VAE}}[\tilde{q}_{IW}(z|x, z_{2:k})]\big]
&= \mathbb{E}_{z_2 \ldots z_k \sim q(z|x)}\left[\int_z \tilde{q}_{IW}(z|x, z_{2:k}) \log\left(\frac{p(x, z)}{\tilde{q}_{IW}(z|x, z_{2:k})}\right) dz\right] \\
&= \mathbb{E}_{z_2 \ldots z_k \sim q(z|x)}\left[\int_z \tilde{q}_{IW}(z|x, z_{2:k}) \log\left(\frac{1}{k}\sum_{i=1}^{k}\frac{p(x, z_i)}{q(z_i|x)}\right) dz\right] \\
&= \mathbb{E}_{z_1 \ldots z_k \sim q(z|x)}\left[\log\left(\frac{1}{k}\sum_{i=1}^{k}\frac{p(x, z_i)}{q(z_i|x)}\right)\right] \\
&= \mathcal{L}_{\mathrm{IWAE}}[q]
\end{aligned}$$

For a more detailed derivation, see the appendix. Note that we are abusing the VAE lower bound notation, since it now involves an expectation over an unnormalized distribution; consequently, we replace the expectation with the equivalent integral.

2.2 EXPECTED IMPORTANCE-WEIGHTED DISTRIBUTION q_EW

We can achieve a tighter lower bound than L_IWAE[q] by taking the expectation of q̃_IW over z_2...z_k. The expected importance-weighted distribution q_EW(z|x) is given by:

$$q_{EW}(z|x) = \mathbb{E}_{z_2 \ldots z_k \sim q(z|x)}\left[\tilde{q}_{IW}(z|x, z_{2:k})\right] = \mathbb{E}_{z_2 \ldots z_k \sim q(z|x)}\left[\frac{p(x, z)}{\frac{1}{k}\left(\frac{p(x,z)}{q(z|x)} + \sum_{j=2}^{k}\frac{p(x,z_j)}{q(z_j|x)}\right)}\right] \qquad (2)$$

See section 5.2 for a proof that q_EW is a normalized distribution. Using q_EW in the VAE ELBO, L_VAE[q_EW], results in an upper bound on L_IWAE[q]. See section 5.3 for the proof, which is a special case of the proof in Naesseth et al. (2017). The procedure to sample from q_EW(z|x) is shown in Algorithm 1; it is equivalent to sampling-importance-resampling (SIR).

2.3 VISUALIZING THE NONPARAMETRIC APPROXIMATE POSTERIOR

The IWAE approximating distribution is nonparametric in the sense that, as the true posterior grows more complex, so does the shape of q̃_IW and q_EW. This makes plotting these distributions challenging. A kernel-density-estimation approach could be used, but it requires many samples. Thankfully, equations (1) and (2) give us a simple and fast way to plot q̃_IW and q_EW approximately, without introducing artifacts due to kernel density smoothing. Figure 1 visualizes q_EW on a 2D distribution-approximation problem using Algorithm 2. The base distribution q is a Gaussian. As we increase the number of samples k while keeping the base distribution fixed, the approximation approaches the true distribution. See section 5.6 for 1D visualizations of q̃_IW and q_EW with k = 2.

3 RESAMPLING FOR PREDICTION

During training, we sample from the q distribution and implicitly weight the samples with the IWAE ELBO. After training, we need to explicitly reweight samples from q.
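As a concrete illustration of this resampling step, here is a minimal sketch of generic sampling-importance-resampling, which the text above identifies as equivalent to Algorithm 1 for drawing from q_EW(z|x). The toy densities and settings are the same illustrative assumptions used in the earlier sketch, not anything specified by the authors.

```python
import numpy as np

# Same toy model as before (an assumption for illustration):
# p(x, z) = N(z; 0, 1) N(x; z, 1), proposal q(z|x) = N(z; x/2, 0.75).
def log_joint(x, z):
    return -0.5 * (z**2 + (x - z) ** 2) - np.log(2 * np.pi)

def log_q(x, z):
    var = 0.75
    return -0.5 * ((z - x / 2) ** 2 / var + np.log(2 * np.pi * var))

def sample_qew(x, k, n_samples=5, rng=np.random.default_rng(1)):
    # Sampling-importance-resampling: draw k proposals from q, weight them by
    # their importance weights, then resample one z in proportion to those weights.
    draws = []
    for _ in range(n_samples):
        z = rng.normal(x / 2, np.sqrt(0.75), size=k)       # z_1..z_k ~ q(z|x)
        log_w = log_joint(x, z) - log_q(x, z)              # unnormalized log weights
        w = np.exp(log_w - log_w.max())
        w /= w.sum()                                       # self-normalized weights
        draws.append(rng.choice(z, p=w))                   # resample one z per batch
    return np.array(draws)

# Approximate draws from q_EW(z|x); larger k moves them closer to p(z|x).
print(sample_qew(x=1.0, k=100))
```

Each returned draw is a single z resampled from a batch of k proposals in proportion to its normalized importance weight, so larger k pushes the draws toward the true posterior.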

Similar articles

Debiasing Evidence Approximations: on Importance-weighted Autoencoders and Jackknife Variational Inference

The importance-weighted autoencoder (IWAE) approach of Burda et al. (2015) defines a sequence of increasingly tighter bounds on the marginal likelihood of latent variable models. Recently, Cremer et al. (2017) reinterpreted the IWAE bounds as ordinary variational evidence lower bounds (ELBO) applied to increasingly accurate variational distributions. In this work, we provide yet another perspec...

Importance Weighted Autoencoders

The variational autoencoder (VAE; Kingma & Welling (2014)) is a recently proposed generative model pairing a top-down generative network with a bottom-up recognition network which approximates posterior inference. It typically makes strong assumptions about posterior inference, for instance that the posterior distribution is approximately factorial, and that its parameters can be approximated w...

Denoising Criterion for Variational Auto-Encoding Framework

Denoising autoencoders (DAE) are trained to reconstruct their clean inputs with noise injected at the input level, while variational autoencoders (VAE) are trained with noise injected in their stochastic hidden layer, with a regularizer that encourages this noise injection. In this paper, we show that injecting noise both in input and in the stochastic hidden layer can be advantageous and we pr...

Publication date: 2017